Data in Brief — Latest Matching Preprints

1

Safety Transparency in Animal Cell-Cultured Ingredients for Pet Food: A Case Study Establishing the Standard for Public Disclosure

Tewari, R.; Soukup, R.; Hadjistylianou, L.; Manicone, M.; Serra, M.; Felbermair, M.; Falconer, S.

2026-07-15 cell biology 10.64898/2026.07.14.738473 medRxiv

Top 0.1%

1.4%

Animal cell-cultured ingredients are entering the EU and UK pet food markets under frameworks that do not require pre-market, ingredient-level safety assessments, creating an ethical need for transparent safety disclosure. We present the first public safety dossier for this sector, describing the proprietary mouse embryonic stem cell line PE25 and its derived, non-viable cellular and conditioned media ingredient produced in food and feed-grade media. PE25 characterization confirmed Mus musculus identity, sterility, absence of mycoplasma and replication-competent retroviruses, and stable growth. Doxorubicin-induced p53 stress testing, CD44/BMI1 profiling, and soft agar assays showed no cancer-like traits and a non-tumorigenic profile; the final ingredient contains no viable cells. Independent OECD TG 471 and 487 assays confirmed non-genotoxicity. Heavy metals, biogenic amines, solvents, and chemical residues were below regulatory limits. Given process variability, we recommend case-by-case safety evaluation and propose this dossier as a model for responsible commercialization.

2

Benchmarking Speech Recognition Models for Medical Consultations in Latin American Spanish: A Comparative Evaluation with Fine-Tuning

Carrillo, R. M.; Carbajal Serrano, A.; Condori Pinedo, P. S.

2026-07-16 public and global health 10.64898/2026.07.14.26358062 medRxiv

Top 1.0%

0.2%

BACKGROUND: Artificial intelligence (AI) medical scribes rely on speech-to-text (STT) models for transcription. Evaluations of STT models in non-English settings remain scarce. We benchmarked ten STT models on medical consultations in Latin American (LatAm) Spanish and assessed whether fine-tuning improves transcription accuracy. METHODS: We used ten YouTube videos depicting medical consultations, with human transcriptions as the ground truth. Five open-source models were evaluated (Whisper Large, Whisper Large v3, Whisper Large v3 Turbo, Voxtral Mini 3B, and Canary 1B v2), along with five closed-source models (gpt-4o-transcribe, gpt-4o-mini-transcribe, gemini-2.5-pro, Eleven Labs, and Assembly AI). Whisper Large v3 was fine-tuned. One video was withheld from training. Performance was assessed using Word Error Rate (WER), Character Error Rate (CER), BLEU Score, ROUGE-L, BERT Score, and Semantic Similarity on the one withheld video. RESULTS: None of the fine-tuning iterations outperformed the vanilla Whisper Large v3. On the withheld video, Gemini-2.5-pro was the closed-source model with the best performance in four of six metrics. In comparison to the closed-source models, the fine-tuned model never outperformed the other models (withheld video); conversely, in comparison to the open-source models, the fine-tuned model showed better performance across metrics, for instance BLEU score (63% vs 58% for the second-ranking model), BERT Score (89% vs 86%), semantic similarity (89% vs 83%), and CER (19% vs 20%). CONCLUSIONS: Whisper Large v3 and its fine-tuned variant are the best open-source STT models for transcribing medical conversations in LatAm Spanish. These findings provide an evidence base for developing AI medical scribes tailored to Spanish-speaking LatAm.
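
For readers unfamiliar with the two error-rate metrics named in this abstract, the following is a minimal sketch of how WER and CER are conventionally computed from an edit distance. It is not the authors' evaluation code: the function names are hypothetical, and details such as text normalization (casing, punctuation, accents) are assumed to be handled beforehand.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # substitution or exact match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference transcript is empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Both metrics are lower-is-better and can exceed 1.0 when the hypothesis contains many insertions, which is worth keeping in mind when comparing them with the higher-is-better scores (BLEU, ROUGE-L, BERT Score).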

3

Critically Ill Children Frequently Receive Medications with Established but Unused Pharmacogenomic Guidelines: Actionable Findings from an Integrated Electronic Medical Record and Exome Sequencing Study

Lynch, N.; Elefant, N.; Revah-Politi, A.; Geneslaw, A. S.; Beckett, J.; Wall, J. B.; Aguilar Breton, C.; Sabatello, M.; Kernie, S. G.; Bayir, H.; Gharavi, A. G.; Motelow, J. E.

2026-07-20 genetic and genomic medicine 10.64898/2026.07.16.26358240 medRxiv

Top 2%

0.1%

Importance: Pharmacogenomic (PGx) guidelines can improve medication efficacy and reduce toxicity, but their application in pediatric intensive care units (PICUs) remains largely unexplored. Objective: To determine the frequency of medications with established PGx guidelines administered in the PICU and assess the capacity of exome sequencing to capture PGx phenotypes for these medications. Design: Retrospective cohort study integrating electronic medical record and exome sequencing data. Setting: Morgan Stanley Children's Hospital of NewYork-Presbyterian, a single-center tertiary care children's hospital. Participants: A total of 4,939 children admitted to the PICU (2020-2024), and 192 children admitted to the PICU who underwent exome sequencing for research purposes (2015-2023). Exposure: Critical illness requiring PICU admission. Main Outcomes and Measures: Frequencies of administration of medications with established PGx guidelines in the PICU and the proportion of individuals with exome sequencing with identifiable PGx phenotypes. Results: Among 4,939 PICU patients, 37.2% (n=1,837) received at least one medication with established PGx guidelines and 14.4% (n=712) received two or more such medications. Twenty PGx genes were implicated; CYP2C9 was most common (17.3%, n=853). An estimated 8.2% of patients received medications for which PGx-guided recommendations would have altered clinical management. Among 192 patients who underwent exome sequencing, at least one metabolizer phenotype was identified in 62% (n=119). Conclusions and Relevance: Many critically ill children receive medications with established PGx guidelines. This study highlights an opportunity for more personalized medicine for critically ill children admitted to a tertiary care hospital and assesses the strengths and weaknesses of exome sequencing to uncover pertinent PGx phenotypes.

4

Acute associations between ambient air pollution and risks of preterm and early-term births: results from 8 states in the United States

Zheng, X.; Fitch, A.; Warren, J. L.; Hao, H.; Strickland, M. J.; Newman, A. J.; Darrow, L. A.; Chang, H. H.

2026-07-21 epidemiology 10.64898/2026.07.19.26358434 medRxiv

Top 2%

0.1%

Exposure to higher levels of ambient air pollution during pregnancy has been linked to multiple adverse pregnancy outcomes. However, studies on acute exposures and reduced gestation length have reported inconsistent findings. This project aims to examine the acute association between ambient air pollution and preterm (28-36 gestational weeks) or early-term (37-38 gestational weeks) births. Daily concentrations of 12 air pollutants, based on bias-corrected numerical model outputs, were linked to vital records of singleton live preterm and early-term births from 2005-2017 in California, Florida, Georgia, Kansas, Nevada, New Jersey, North Carolina (2005-2015), and Oregon. Under a time-stratified case-crossover design, odds ratios (OR) were estimated via conditional logistic regression with adjustment for risks among ongoing pregnancies, meteorology, time trends and federal holidays. We estimated cumulative associations up to a 6-day lag using distributed lag models. Risk estimates per interquartile range (IQR) increase in exposure were pooled across states using inverse-variance weighting. Our study included 1,085,162 preterm and 3,901,185 early-term births. We observed positive associations between 0-2 day cumulative exposure to several air pollutants and early-term births, including NO2 (OR: 1.0023, 95% CI: 1.0010, 1.0037 per 7.1 µg/m3 increase), PM2.5 (OR: 1.0022, 95% CI: 1.006, 1.0038 per 4.6 µg/m3 increase), PM2.5 organic carbon (OR: 1.0026, 95% CI: 1.0013, 1.0039 per 1.7 µg/m3 increase) and PM2.5 elemental carbon (OR: 1.0025, 95% CI: 1.0014, 1.0035 per 0.26 µg/m3 increase). Associations with preterm birth were mostly null. In conclusion, we found positive associations between short-term air pollution exposure, including PM and major PM2.5 components, and risks of early-term birth.
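
The abstract states that state-specific estimates were pooled with inverse-variance weighting. A minimal sketch of that calculation follows, assuming a fixed-effect pooling on the log-odds scale with standard errors recovered from 95% confidence intervals. The function name is hypothetical, and the abstract does not report the state-level inputs, so this is an illustration of the general technique rather than the authors' analysis.

```python
import math


def pool_inverse_variance(estimates, z=1.96):
    """Fixed-effect inverse-variance pooling of odds ratios.

    estimates: iterable of (odds_ratio, ci_low, ci_high) tuples, one per state.
    Returns (pooled_or, pooled_ci_low, pooled_ci_high).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for odds_ratio, ci_low, ci_high in estimates:
        beta = math.log(odds_ratio)
        # Standard error recovered from the CI width on the log scale.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
        weight = 1.0 / se ** 2
        weighted_sum += weight * beta
        weight_total += weight
    pooled_beta = weighted_sum / weight_total
    pooled_se = math.sqrt(1.0 / weight_total)
    return (math.exp(pooled_beta),
            math.exp(pooled_beta - z * pooled_se),
            math.exp(pooled_beta + z * pooled_se))
```

States with tighter confidence intervals receive larger weights, so the pooled interval is narrower than any single contributing interval.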

5

Trends and Future Burden of Major Gastrointestinal Cancers in Jiangsu Province, China, 2010-2030

Zou, Y.; Wang, W.; Tao, L.; Zhu, H.; Ju, H.; Pan, L.; Wang, W.

2026-07-17 public and global health 10.64898/2026.07.16.26358207 medRxiv

Top 3%

0.1%

Aim: To assess temporal trends in incidence and mortality and project the future burden of five major gastrointestinal cancers in Jiangsu Province, China. Methods: Population-based cancer registry data from Jiangsu Province between 2010 and 2021 were used to analyze the burden of esophageal, gastric, colon, rectal, and liver cancers. Age-standardized incidence and mortality rates were calculated and compared by cancer type, sex, and urban-rural residence. Joinpoint regression was used to estimate annual percentage changes (APC) and average annual percentage changes (AAPC). The APC from the most recent Joinpoint segment was used to project incidence and mortality rates to 2030. Results: In 2021, gastric cancer had the highest age-standardized incidence and mortality among the five cancers. Incidence and mortality were consistently higher in males than in females and increased markedly after 50 years of age. From 2010 to 2021, age-standardized incidence and mortality declined for esophageal, gastric, and liver cancer, but increased for colon and rectal cancer. Colon cancer showed the steepest increase in both incidence and mortality. Rural areas experienced faster increases in colon and rectal cancer burden than urban areas. Projections to 2030 suggest continued declines in esophageal, gastric, and liver cancer, while colon cancer incidence and mortality are expected to rise further. Conclusion: Jiangsu Province is experiencing a transition in gastrointestinal cancer burden, with continued declines in esophageal, gastric, and liver cancers but an emerging and growing burden of colorectal cancer, especially colon cancer. Prevention strategies should focus on expanding colorectal cancer screening and early diagnosis, particularly in rural areas, while sustaining control of esophageal, gastric, and liver cancers.
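
The projection step described in this abstract (carrying the most recent Joinpoint segment's APC forward to 2030) can be sketched as a simple compounding calculation. The example below assumes the conventional definition of APC from a log-linear trend, APC = 100 × (e^slope − 1), and a constant APC over the projection window. The function names and any input values are hypothetical and not taken from the study.

```python
import math


def apc_from_slope(slope):
    """Annual percentage change implied by the slope of ln(rate) on calendar year."""
    return (math.exp(slope) - 1.0) * 100.0


def project_rate(last_rate, last_year, target_year, apc):
    """Project an age-standardized rate forward, compounding a constant APC each year."""
    years_ahead = target_year - last_year
    return last_rate * (1.0 + apc / 100.0) ** years_ahead
```

For example, `project_rate(rate_2021, 2021, 2030, apc)` compounds the change over nine years. Because the APC is held fixed, such projections become less reliable the further they extend beyond the last observed segment.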

6

LocusBlend: Flexible multi-index regional visualization of genomic association signals

Yang, C.; Cook, N.; Zeng, Y.; Fu, T.; Budde, J.; Cruchaga, C.; Belloy, M. E.

2026-07-21 genetic and genomic medicine 10.64898/2026.07.15.26358129 medRxiv

Top 3%

0.1%

Summary: It has become standard practice to visualize regional signals from genome-wide association studies (GWAS) using LocusZoom plots. Similarly, GWAS signals are compared to regionally matched quantitative trait loci (QTLs), i.e., variant-to-gene regulation data, using LocusCompare plots to aid assessment of candidate trait-related genes. Despite broad usage, these tools annotate variants by linkage disequilibrium (LD) to a single lead or index variant. This single-index representation has limitations for visualizing complex loci that contain multiple independent signals. We present LocusBlend, an interactive web application for multi-index, LD-blended visualization of genomic loci. LocusBlend supports one or two genomic association summary-statistic datasets and one to three index variants, multi-index LocusZoom color-blended plots, and matching LocusCompare visualizations. Applications to Alzheimer's disease GWAS and QTL signals illustrate that LocusBlend enables visualization and separation of independent signals despite shared LD and high genomic complexity. Overall, LocusBlend is aimed at helping researchers handle the continuously expanding complexity of human genomics findings. Availability and Implementation: LocusBlend is freely available at https://locusblend.wustl.edu. Publication-ready plots are generated in 1 min. Source code, documentation, example datasets, input templates, and reproducibility instructions are available at https://github.com/BelloyLab/LocusBlend. LocusBlend is implemented in Python using Streamlit, Plotly, and PLINK. Supplementary Information: Supplementary data are available online.

7

The Registry of Pregnant Women at Cruces University Hospital: an ethical framework for prospective research with preanalytical optimization of maternal plasma processing

Gonzalez-Moro, I.; Sanchez-Garcia, H.; Medina Cuesta, T.; Rodriguez Lirio, A.; Espin Lopez, M. d. P.; Esquivel Gonzalez, S.; Quintana Ochoa de Alda, E.; de la Pena-Sanz, M.; Marin Cano, L.; Sarasua-Blanco, N.; Ortiz Salinas, P.; Sanfeliu Padulles, A.; Ruiz Adrian, A.; Martinez Isidoro, A.; Aldaiturriaga Otaola, A.; Aramburu Gil, A.; Garcia Gil, A.; Saenz Saenz, A.; Heredia Campos, A.; Fernandez Salado, A.; Ramirez Jarana, A. I.; Tobar Lopez, A. I.; Casarojos Oses, A. J.; Martinez de Maranon Toral, A.; Satiago Hidalgo, A.; Silva Diaz, A.; Basterrechea Miguel, A.; Castanos Lasa, A.; Esteras Vadi

2026-07-17 obstetrics and gynecology 10.64898/2026.07.17.26357942 medRxiv

Top 4%

0.1%

Background: Prospective pregnancy registries and biobanking infrastructures are essential for future translational studies investigating maternal, placental and offspring health. However, circulating nucleic acid analyses are highly sensitive to preanalytical variability, particularly regarding blood-collection tube type and sample processing conditions. We established a prospective pregnancy registry and biobanking workflow at Cruces University Hospital and evaluated the impact of preanalytical variables on circulating cell-free DNA (cfDNA) and cell-free RNA (cfRNA) preservation in maternal plasma collected at delivery. Methods: The Registry of Pregnant Women at Cruces University Hospital was designed as a prospective infrastructure integrating placental sampling, maternal blood collection and ethically controlled future access to maternal and offspring clinical data. Within this framework, peripheral blood samples from 50 women at delivery were simultaneously collected into EDTA, Norgen and Roche tubes. Plasma samples processed within or after 24 hours following collection underwent cfDNA/cfRNA extraction, electrophoretic profiling, fluorometric quantification and RT-qPCR analyses targeting different stress-related genes. Results: By the end of June 2026, 1,127 women had been prospectively recruited into the registry, with 661 plasma samples, 637 serum samples and 858 sets of four placental biopsies collected, processed and stored in the Basque Biobank. In the preanalytical substudy, EDTA tubes yielded higher cfDNA concentrations, likely reflecting reduced cellular preservation and genomic DNA contamination. In contrast, Roche tubes showed superior cfRNA preservation, with higher cfRNA concentrations and more consistent detection of the characteristic 5S rRNA peak compared with EDTA and Norgen tubes. 
Processing delays beyond 24 hours reduced cfRNA concentration, while associations between circulating transcripts and gestational age were more consistently detectable in preservative-containing tubes. Conclusions: Prospective infrastructures like ours offer a strong foundation for large-scale, long-term studies in the framework of the Developmental Origins of Health and Disease hypothesis. Technically, Roche tubes provided superior cfRNA preservation and enhanced sensitivity for detecting subtle biological associations, supporting the importance of standardized preanalytical workflows within prospective pregnancy biobanking resources.

8

Feasibility of using automatically extracted routine clinical data in a respiratory cohort study: The SPHN-SPAC demonstrator project.

Romero, F.; Sasaki, M.; Mallet, M. C.; Pedersen, E. S. L.; Leuenberger, L. M.; Makhoul, R.; Bovermann, X.; Hartung, A.; Latzin, P.; Kissling, S.; Moeller, A.; Treis, A.; Regamey, N.; Belle, F. N.; Kuehni, C. E.

2026-07-16 epidemiology 10.64898/2026.07.14.26357927 medRxiv

Top 4%

0.1%

Objectives: To assess the feasibility of using clinical data automatically extracted via the Swiss Personalized Health Network (SPHN) to complement or replace manually abstracted clinical data in the Swiss Paediatric Airway Cohort (SPAC). Materials and Methods: We studied 1,075 SPAC participants enrolled between 2017 and 2023 at two Swiss children's hospitals. Clinical data were extracted from electronic health records via SPHN in Resource Description Framework format, transformed into visit-centered datasets, and compared with manually abstracted SPAC clinical data and parent-reported emergency department (ED) visits and hospitalizations from follow-up questionnaires. We assessed feasibility by identifying challenges in acquiring data and evaluated data quantity, completeness, and agreement between datasets. Results: We obtained analysis-ready SPHN-derived datasets from two hospitals after 24 months. SPHN-derived data captured more pneumology outpatient visits than manual abstraction (Hospital A: 1,963 vs 1,049; Hospital B: 2,343 vs 1,010) and identified clinical events among children without follow-up questionnaires. Completeness of variables varied across hospitals and encounters, reflecting differences in local clinical documentation practices. SPHN-derived and manually abstracted data showed high agreement for structured clinical variables, including spirometry measurements (concordance correlation coefficient >0.99). Self-reported and SPHN-derived ED visits and hospitalizations showed high absolute agreement but moderate concordance. Discussion and Conclusion: Automated extraction of routine clinical data increased the completeness of longitudinal information compared with manual abstraction, suggesting that SPHN-derived data can complement manual data collection in cohort studies.
Broader use remains limited by heterogeneous clinical documentation practices and the substantial effort required to harmonize and transform extracted data into analysis-ready research datasets.
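
The agreement statistic reported above for spirometry is the concordance correlation coefficient. A minimal sketch of Lin's formulation is given below, assuming paired measurements from the two data sources and population (1/n) variances. The function name is hypothetical, and the abstract does not specify which variant or software the authors used.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    Unlike Pearson's r, it penalizes systematic offsets between the two
    sources as well as scatter around the line of identity.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("coefficient is undefined when both series are constant and equal")
    return 2 * cov_xy / denominator
```

A constant shift between sources lowers the coefficient even when the two series are perfectly correlated, which is why it suits checks of whether extracted values reproduce manually abstracted ones.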

9

Genetic Counselor Utilization Across Non-Genetics Departments for Neurodevelopmental Disorders

Cole, J. J.; Cohen, J. S.; Sahin, M.; Srivastava, S.; Campbell, C. A.

2026-07-21 genetic and genomic medicine 10.64898/2026.07.20.26358492 medRxiv

Top 4%

0.1%

IMPORTANCE: Most United States children with neurodevelopmental disorders have not received genetic testing aligned with current guidelines. Integration of genetic counselors into non-genetics departments is a potential strategy to improve uptake, but prevalence and details of integrated care models are unknown. OBJECTIVE: To characterize availability, utilization, and perceived need for genetic counselors across non-genetics departments caring for patients with neurodevelopmental disorders. DESIGN: Cross-sectional observational department-level survey. SETTING: Child neurology, adult neurology, developmental pediatrics, child psychiatry, and adult psychiatry departments at Intellectual and Developmental Disabilities Research Centers. PARTICIPANTS: The survey was distributed to 67 departments across 15 institutions. The departmental response rate was 52% (35/67), with at least one response from 87% (13/15) of institutions. EXPOSURE: Presence/absence of dedicated genetic counselor(s), where "dedicated" was defined as hired by the department. MAIN OUTCOME(S) AND MEASURE(S): This was a descriptive study only, with no comparative statistical analyses due to the exploratory nature. RESULTS: One third of departments (34%; 12/35) reported having dedicated clinical genetic counselors. Prevalence was highest in child neurology (67%; 8/12), followed by adult neurology (40%; 2/5) and developmental pediatrics (22%; 2/9), with none in child psychiatry (0/7) or adult psychiatry (0/2). In almost all departments with genetic counselors (92%; 11/12), they directly billed for their services, which universally included pre-test counseling/consent and post-test counseling. In departments without genetic counselors, only 39% (9/23) reported providers ordered their own genetic testing. Among all departments, over half (57%) were interested in adding/increasing genetic counseling support, while 26% were unsure and 17% uninterested.
Insufficient funding was the most cited barrier; only one department reported insufficient need. CONCLUSIONS AND RELEVANCE: Though currently implemented in only one third of departments, our findings suggest those with dedicated genetic counselors directly pursue genetic testing (without referring to genetics) more than those without genetic counselors. Interest in increasing or adding genetic counseling support was high, and though funding was a reported barrier, feasible funding models were described. In the context of limited medical geneticists and expanding precision therapies, alternate delivery models for neurodevelopmental genetic testing including genetic counselor integration in non-genetics departments may help to scale and sustain uptake.

10

Quantifying the global burden of lead exposure from dietary lead intake

Kinally, C.; Hu, H.; Fuller, R.

2026-07-21 occupational and environmental health 10.64898/2026.07.20.26358457 medRxiv

Top 4%

0.1%

Background: Lead exposure is estimated to cause approximately 3.5 million premature deaths a year, yet the key ongoing sources of lead exposure are unclear. Methods: We estimated the contribution of dietary lead intake to global blood lead levels (BLLs) for 7-year-old children and 22-year-old adults by applying the All-Ages Lead Model (AALM) to calculate BLLs based on 25 total diet studies (TDS) that quantify dietary lead intake across 46 countries. Results: For children, the population-weighted average dietary lead intake in low- and middle-income countries (LMICs) (32.0 µg/day) was found to be more than three times higher than in high-income countries (HICs) (9.3 µg/day), and more than 10 times higher than the FDA reference level for children (2.2 µg/day). The average impact on BLLs for children is estimated to be near 29 µg/L in LMICs and near 12 µg/L in HICs. Averaged across the TDS data, vegetables (27%) and cereals (24%) were found to contribute the most to dietary lead. Conclusions: While there are limitations associated with biokinetic modelling and the TDS data from LMICs, these results suggest that the contribution of dietary lead intake to global lead exposure is in the region of 40 to 50%, suggesting, in turn, that dietary lead intake is likely a major global driver of lead poisoning. Lead absorbed from the environment into food crops is expected to be the key driver of dietary lead. Current regulatory levels for maximum lead concentrations in foods (0.05-0.3 mg/kg) are out-of-date and may imply a dietary lead intake of 200 µg/day, far higher than the FDA reference level (2.2 µg/day). Collecting representative TDS data in high lead burden countries should be a priority. Further research is also recommended on upstream lead sources and pathways of lead uptake in plants, driving global food contamination.

11

The Swiss Integrated Care (INCA) Study: Description of a Novel Prospective Cohort of Patients and Caregivers in Reimbursed Informal Care

Nittas, V.; Heiniger, S.; Haag, C.; Frei, A.; von Wyl, V.; Aebersold, H.; Eid Madkour, M.; Hellmann, A.; Puhan, M. A.

2026-07-15 public and global health 10.64898/2026.07.14.26358029 medRxiv

Top 5%

0.1%

Methods: INCA is a prospective, single-center cohort study with nationwide recruitment. Participation is open to adult patients and informal caregivers who provide paid informal care through home care agencies (Spitex organizations) in Switzerland. Eligible participants are enrolled consecutively. The cohort's primary outcome is health-related quality of life of patients, assessed monthly through patient-reported outcome measures. Secondary outcomes include home care needs, including the overall health and well-being of patients (measured semi-annually), the type, amount, and quality of care (recorded daily), and caregiver burden and resilience (measured quarterly). Additional analyses will include structured medical data, extracted from patient-provided documents using Optical Character Recognition (OCR) technology and analysed using Large Language Models (LLM). Results: Since recruiting started in July 2025, the cohort has enrolled 855 patients and 851 caregivers. Among patients, 53% are female, with a median age of 73 years. Caregivers are predominantly female (72%) with a median age of 56 years. Most patients experience impairments in physical functioning and participation in social roles. Among them, 85% require less than two hours of care per day, though care needs vary considerably. This is further reflected in the multi-attribute utility, where the overall PROMIS-Preference (PROPr) score is very low for most patients, with a median of 0.102, indicating a substantial need for medical assistance and care. With a median of 9 comorbidities, health-related quality of life is overall low for most patients. Cardiovascular and endocrine & metabolic diseases are amongst the most prevalent, affecting 69% and 65% of patients with available diagnoses (n=771). Certain diagnostic pairs occur more frequently than expected by chance, suggesting underlying links between disease categories.
Conclusions: INCA responds to a growing policy need for robust evidence on how new models of informal care are associated with the health and well-being of patients and their caregivers. Its longitudinal design, combining patient- and caregiver-reported data with medical records and innovative data extraction methods, will lay the groundwork for a better understanding of new informal care models in real-world settings. INCA's findings are expected to have significant policy relevance and contribute to evidence-based policies on long-term care at home in Switzerland.

12

Beyond climatic drought indices : an hydraulic approach to quantifying forest water stress

Cochard, H.

2026-07-15 plant biology 10.64898/2026.07.13.738371 medRxiv

Top 5%

0.1%

The article introduces a new Forest Stress Index (ISF) based on a plant hydraulic modelling approach rather than classical climatic drought indices. Unlike other indices such as scPDSI or SPEI, ISF is grounded in xylem embolism dynamics simulated with the mechanistic SurEau model. The goal is to better link climatic anomalies to tree physiological functioning and mortality risk. ISF is defined using a locally adapted ideotype characterized by an optimal P50 value under a reference hydraulic functioning threshold. Simulations are performed across Europe and France using multiple climate datasets. The index is robust to model parameterization choices and assumptions about plant functional traits. Results show strong spatial and temporal consistency and significant correlations with SPEI and scPDSI. However, ISF more strongly highlights extreme drought years and exhibits a more skewed distribution. Future projections under SSP5-8.5 indicate a widespread increase in hydraulic stress with strong regional contrasts. Overall, ISF provides a mechanistic and complementary drought indicator more directly linked to forest mortality processes.

13

Developing a Global Framework for Digital Health in Traumatic Brain Injury (TBI): Clinician Perspectives of the Use of Digital Technologies in the TBI Care Pathway

Mantle, O.; Smith, B. G.; Whiffin, C.; Hobbs, L.; Penmetcha, V.; Menon, A.; Venturini, S.; Bashford, T.; Hutchinson, P. J.

2026-07-20 public and global health 10.64898/2026.07.17.26358327 medRxiv

Top 5%

0.1%

Background: Traumatic brain injury (TBI) affects 69 million individuals globally each year, yet care remains fragmented across complex, multi-specialty pathways and settings. Digital health technologies offer potential to bridge care gaps, particularly in resource-limited settings, yet existing frameworks do not adequately address the complexities of the TBI care pathway or the diverse global contexts in which care occurs. Methods: A cross-sectional qualitative study using critical realist-informed thematic analysis was conducted with practising neurosurgeons recruited internationally via National Institute for Health and Care Research Global Health Research Group on Acquired Brain and Spine Injury (NIHR ABSI) collaborating centres, social media, and society newsletters. Semi-structured interviews were conducted by a single researcher (OM) via Microsoft Teams (March-July 2024), exploring technology availability, healthcare infrastructure, clinical pathways, and contextual challenges, with a systems thinking approach guiding identification of current and potential technology integration points. Fourteen neurosurgeons from twelve countries participated, representing six lower-middle, two upper-middle, and four high-income countries. Results: Six inductive themes emerged (Availability, Acceptability, Applicability, Capability, Feasibility, and Possibility), forming a novel conceptual framework visualised as a hexagonal chart for guiding digital health technology design and implementation in TBI care. Marked disparities in technology availability and utilisation were identified across urban/rural settings and income levels. Conclusions: This framework offers a practical, context-sensitive tool for researchers, policymakers, and clinicians developing or implementing digital health technologies in TBI care globally.
Visualisation in a style similar to a radar chart enables simultaneous consideration of factors (including digital literacy, infrastructure, and cultural attitudes) whose neglect frequently underlies implementation failures.

14

Chronicle of a Death Foretold: The Ecological Collapse of Sakumo Lagoon, Ghana

Akwetey, M. F. A.; Lamptey, E.; Abrokwah, S.; Aheto, D. W.; Mensah, P. K.; Okyere, I.; Akintola, S. L.; Pauly, D.

2026-07-15 ecology 10.64898/2026.07.14.738434 medRxiv

Top 5%

0.1%

Sakumo Lagoon, a small (1 km2) semi-open coastal lagoon in Ghana, lies between the cities of Accra and Tema. The lagoon and its surrounding wetland were designated a Ramsar Site in 1992, mainly because they served as a refuge for 66 local and migratory bird species. Its ecology, and the biology of its major fish species, notably the blackchin tilapia (Sarotherodon melanotheron), were thoroughly studied in 1971, when the lagoon was a diverse, mainly brackish ecosystem supporting a well-managed traditional fishery. In 2016-2017, another study found the lagoon mostly covered by floating vegetation and plastic waste. Finally, in 2024, a visual survey established that the floating vegetation had been almost completely replaced by terrestrial plants, with only a few square meters of garbage-strewn water in front of a culvert connecting the lagoon to the open sea. Several lagoons along the coast of Ghana have been similarly lost to urban sprawl and its various forms of pollution, but Sakumo Lagoon is a Ramsar Site, and its imminent disappearance should not remain undocumented.

15

CuGen: A GPU-accelerated framework for large-scale genomics

Kiiskinen, T.; Richland, J.; Wang, W.; Lu, W. S.; Balasubramanian, N.; Hastie, T.; Tibshirani, R.; Rivas, M. A.

2026-07-17 genetic and genomic medicine 10.64898/2026.07.15.26358178 medRxiv

Top 5%

0.1%

Biobank-scale genomic analyses remain computationally expensive, CPU-bound workflows, particularly when adjusting for confounding. Here, we present CuGen, a GPU-accelerated framework for large-scale genomics. CuGen uses UltraLasso, a novel hierarchical application of univariate-guided sparse regression (uniLasso), to select a compact, phenotype-informed active set of fewer than 30,000 variants. This achieves robust leave-one-chromosome-out (LOCO) confounding control, enabling both downstream GWAS and in-sample fine-mapping. Additionally, we introduce the .cugen file format, a genotype representation designed for memory-optimized, high-throughput streaming and random access on GPU hardware. Building on this substrate, we provide a general GPU-accelerated genomics toolkit handling polygenic prediction, data manipulation, quality control, analysis, and visualization. We demonstrate CuGen's efficacy in the UK Biobank with up to 408,624 individuals, where the full GWAS pipeline and fine-mapping against 6.8 million imputed variants complete in approximately 10 minutes on a single high-throughput GPU with 80 GB of memory. The pipeline scales efficiently to massive phenome-wide analyses with sublinear resource consumption.

16

Knowledge and misconceptions of the French population regarding medical genetics: a survey of 3,000 respondents

Mercier, S.; Petit, F.; Misrahi, M.; Berta, P.; Cambon-Thomsen, A.; Chaumette, B.; Chneiweiss, H.; Cretolle, C.; Edery, P.; Heard, D.; Konyukh, M.; Laeng, C.; Mahlaoui, N.; Pasquier, L.; Plutino, M.; Odent, S.; Stoppa-Lyonnet, D.; "Genetics and the General Public" FFGH Ethics Working Group

2026-07-19 genetic and genomic medicine 10.64898/2026.07.17.26358259 medRxiv

Top 6%

0.1%

Advances in high-throughput sequencing and genetic research have expanded the role of genetics in medicine and society. Population-based screening programs, including neonatal and preconception testing, are increasingly implemented globally, alongside the rise of direct-to-consumer (DTC) genetic testing. The "Genetics and the General Public" Ethics Working Group of the French Federation of Human Genetics (FFGH) assessed knowledge and awareness of genetics within the French population through a nationally representative survey (n=3,013) conducted by the polling firm Ipsos bva. Results indicated that 69% of respondents reported an interest in genetics, although their level of knowledge remained limited. Most respondents expressed positive attitudes toward genetics, perceiving it as a major source of hope in healthcare. While a majority indicated willingness to undergo genetic testing for medical purposes, they also reported legitimate concerns regarding the potential results. Despite legal restrictions, 12% reported having ordered a DTC genetic test (5% for genealogical purposes, 5% for medical purposes, and 2% for both), and 45% of non-users expressed strong interest in this type of test. Notably, there is a substantial lack of awareness regarding the limitations of these tests and the French legal framework governing their use. These findings highlight critical gaps in public knowledge, emphasizing the need for improved genetic education, including incorporating genetics into school curricula and launching targeted awareness campaigns. These initiatives should help clarify the distinctions between clinically validated genetic tests and DTC genetic testing services, addressing both their benefits and their ethical, legal, and scientific limitations, in order to promote informed decision-making.

17

A Multimodal Benchmark for Evaluating Cause-of-Death Inference Using Child Health and Mortality Data

Yang, J.; Pan, S.; Lim, H. S.; Chu, Y.; Guo, Y.; Agarwal, N.; Babbar, V.; Parikh, G. R.; Chen, Y. T.; Rees, C. A.; Dangor, Z.; Lala, S. G.; Li, Z. R.; Clark, S. J.; Wu, Z.; Datta, A.; Liu, L.; Rudin, C.; Scarpino, S. V.; Gyori, B. M.; McCormick, T. H.

2026-07-15 public and global health 10.64898/2026.07.13.26357980 medRxiv

Top 6%

0.1%

Accurately attributing causes of death is vital for global health, yet fewer than 5% of deaths in resource-constrained regions are medically certified. To assign causes to these unlabeled deaths at scale, practitioners traditionally rely on verbal autopsy, using supervised statistical models to classify based on structured survey data. However, modern mortality surveillance increasingly collects rich, unstructured multimodal data, such as free-text caregiver narratives and postmortem diagnostics, which traditional supervised statistical models struggle to seamlessly integrate. In this paper, we present a comprehensive, multimodal benchmark for cause-of-death classification using data from the Child Health and Mortality Prevention Surveillance (CHAMPS) network, a unique surveillance platform spanning nine countries across South Asia and Sub-Saharan Africa. Using this dataset, we introduce an evaluation framework designed to rigorously assess diagnostic reasoning, moving beyond traditional metrics that fail to capture complex clinical realities. We demonstrate the utility of this benchmark by evaluating zero-shot large language models against supervised baselines across various data modalities. Our results reveal distinct differences in how these modeling approaches synthesize unstructured medical evidence. This benchmark provides a rigorously defined resource for assessing clinical reasoning in next-generation mortality surveillance.

18

Hidden Structural Bias in Proteomics: Sonication-induced Selective Fragmentation of Intrinsically Disordered Regions

Narita, M.; Yamakawa, T.; Nishimura, R.; Iwasaki, M.

2026-07-15 cell biology 10.64898/2026.07.14.738389 medRxiv

Top 6%

0.1%

Sonication is a fundamental technique in proteome sample preparation, primarily used for protein solubilization and shearing of genomic DNA. Although the mechanical shearing of DNA is well-characterized, its unintended impact on protein structural integrity remains a significant "blind spot" in high-throughput analytical workflows. In this study, we systematically investigated sonication-induced protein fragmentation by combining gel-based fractionation (PEPPI-MS) with sequence-level compositional analysis and bioinformatic mapping. Our results demonstrate that sonication does not significantly alter overall proteome identification or the recovery of membrane proteins; however, it induces extensive and non-random protein fragmentation. Sonication caused an approximately three-fold increase in the abundance of >45 kDa protein-derived fragments migrating into the <40 kDa fraction, and 1,620 high-molecular-weight (MW) proteins were uniquely detected in the lower-MW fraction upon sonication, an eight-fold increase over non-sonicated controls. Peptide-level amino acid composition analysis revealed subtle but directional shifts in the sonication-derived fragments. This residue-level signature is reinforced by two orthogonal structural analyses (MobiDB peptide-level mapping and protein-level profiling using metapredict V3 software), which show that sonication-susceptible proteins harbor more than twice the disordered content of length-matched controls (median 40% vs. 18%). This study identifies a previously unrecognized "structural bias" whereby intrinsically disordered region (IDR)-rich proteins are selectively compromised during sample preparation. 
Because these fragments are indistinguishable from enzymatic digestion products in conventional bottom-up proteomics, the underlying structural damage is effectively masked in global quantitative datasets, potentially distorting biological interpretations related to protein size, isoforms, and stability, particularly for IDR-rich classes, such as transcription factors and signaling molecules. We propose that optimizing and standardizing sonication parameters is essential for ensuring the accuracy and reproducibility of quantitative proteomic analyses.

19

Analytical Performance and 99th Percentile Upper Reference Limit of the Novel SPINCHIP High-Sensitivity Cardiac Troponin I Point-of-Care Assay

MacKenzie, J.; Aakre, K. M.; Paus, D.; Broughton, M. N.; Storvold, G. L.; Olberg, A.; Stenmark, S.; Booij, B. B.; Scott, S.; Michel-Busseret, S.; Octave, L.; Tveit, A.; Lyngbakken, M. N.; Nilsson, J.; Rosjo, H.

2026-07-20 emergency medicine 10.64898/2026.07.17.26357157 medRxiv

Top 6%

0.1%

BACKGROUND: In line with International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) recommendations for high-sensitivity cardiac troponin assays, analytical validation and reference limit assessments are required to confirm that an assay meets performance criteria. This study evaluated the analytical performance and established the 99th percentile upper reference limit (URL) for the SPINCHIP High-Sensitivity Cardiac Troponin I (SPINCHIP hs-cTnI) point-of-care assay. METHODS: Analytical performance characteristics, including the limit of blank (LoB), limit of detection (LoD), and limit of quantification (LoQ), were assessed. Additionally, 1,053 plasma samples and 1,055 whole-blood samples were used to determine the URL. Imprecision around the 99th percentile URL was evaluated as part of the analytical validation. High-sensitivity criteria were assessed by confirming measurable cTnI in ≥50% of healthy individuals (n=432 plasma; n=431 whole blood) and achieving imprecision <10% at the 99th percentile (plasma, n=960; whole blood, n=480). RESULTS: SPINCHIP hs-cTnI demonstrated a LoB of 0.3 ng/L; LoDs of 0.8 ng/L (plasma) and 0.9 ng/L (whole blood); and LoQs of 1.1 ng/L (plasma) and 1.4 ng/L (whole blood). The analytical measuring range was 1.1-9,000 ng/L. Imprecision at the common 99th percentile URL (14 ng/L) was 5.8%; for men (URL=16 ng/L) it was 5.6% and for women (URL=10 ng/L) 6.3%. Greater than 85.2% (94.0% and 76.1% in men and women, respectively) of healthy individuals showed measurable cTnI above the LoD. CONCLUSIONS: The SPINCHIP hs-cTnI assay meets the IFCC high-sensitivity requirements, demonstrating <10% imprecision at the 99th percentile, reliable low-concentration precision, and cTnI detection in more than half of healthy individuals.
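The two high-sensitivity criteria described above (measurable cTnI above the LoD in at least 50% of healthy individuals, and imprecision below 10% at the 99th percentile URL) reduce to simple arithmetic. The Python sketch below illustrates them under stated assumptions: the function names are hypothetical, imprecision is taken to be the coefficient of variation (CV) of replicate measurements, and the nearest-rank percentile is one common nonparametric choice, not necessarily the estimator used in the study.

```python
import math
import statistics


def nearest_rank_percentile(values, p):
    """Nonparametric nearest-rank percentile (assumed estimator, p in 0-100)."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def cv_percent(replicates):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100 * statistics.stdev(replicates) / statistics.mean(replicates)


def check_high_sensitivity(healthy_values, lod, replicates_near_url):
    """Evaluate both criteria on hypothetical input data.

    healthy_values:      cTnI results (ng/L) from healthy individuals.
    lod:                 limit of detection (ng/L).
    replicates_near_url: repeated measurements of a sample close to the URL.
    """
    fraction_measurable = sum(v > lod for v in healthy_values) / len(healthy_values)
    cv = cv_percent(replicates_near_url)
    return {
        "fraction_measurable": fraction_measurable,
        "cv_percent": cv,
        "passes": fraction_measurable >= 0.5 and cv < 10.0,
    }
```

Calling `nearest_rank_percentile(reference_values, 99)` on a healthy reference cohort gives a candidate URL, and `check_high_sensitivity` then reports whether both thresholds are met for a given data set.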

20

FoodScribe: an open-source semantic framework for nutrient estimation from free-text dietary records

Gouda, H.; Sala Climent, M.; Agongo, J.; Gaikwad, S. P.; Nattakom, A.; Zhao, H. N.; Xing, S.; Boland, B. S.; Holt, T.; Guma, M.; Dorrestein, P. C.

2026-07-17 nutrition 10.64898/2026.07.15.26358181 medRxiv

Top 6%

0.1%

Efficiently summarizing dietary records at scale remains a persistent bottleneck in nutritional epidemiology. We present FoodScribe, which translates free-text meal descriptions into quantitative nutrient profiles by combining ingredient parsing with nutrient retrieval from the USDA FoodData Central (FDC) database. Benchmarked with three LLM providers on the Nutribench dataset, FoodScribe completed annotation of 3,807 meal descriptions in 2.5 hours, a task otherwise requiring substantial manual effort from trained nutritionists. FoodScribe achieved F1 scores of 0.79-0.89 across macronutrient estimation, with models performing better for protein than for fat estimation. Application to a Mediterranean diet intervention cohort indicated dietary shifts consistent with the intervention pattern based on model-derived estimates. Integration with metabolomics data suggested that fiber and vegetable intake were positively associated with a fecal metabolite cluster.